Introduction¶

As you learned in the previous lessons, YOLO is a state-of-the-art, real-time object detection algorithm. In this notebook, we will apply the YOLO algorithm to detect objects in images. We have provided a series of images that you can test the YOLO algorithm on.

Importing Resources¶

We will start by loading the required packages into Python. We will use OpenCV to load our images, matplotlib to plot them, a utils module that contains some helper functions, and a modified version of Darknet. YOLO uses Darknet, an open-source, deep neural network framework written by the creators of YOLO. The version of Darknet used in this notebook has been modified to work with PyTorch 0.4 and has been simplified because we won't be doing any training. Instead, we will use a set of pre-trained weights that were trained on the Common Objects in Context (COCO) database. For more information on Darknet, please visit the Darknet website.

In [1]:
pip install opencv-python
Collecting opencv-python
  Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl.metadata (20 kB)
Requirement already satisfied: numpy>=1.21.2 in c:\users\hieub\miniconda3\lib\site-packages (from opencv-python) (1.26.4)
Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.10.0.84
In [2]:
import cv2
import matplotlib.pyplot as plt
In [3]:
pip install torch
Requirement already satisfied: torch in c:\users\hieub\miniconda3\lib\site-packages (2.5.1)
Requirement already satisfied: filelock in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.16.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\users\hieub\appdata\roaming\python\python312\site-packages (from torch) (4.12.2)
Requirement already satisfied: networkx in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.4.2)
Requirement already satisfied: jinja2 in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.1.4)
Requirement already satisfied: fsspec in c:\users\hieub\miniconda3\lib\site-packages (from torch) (2024.10.0)
Requirement already satisfied: setuptools in c:\users\hieub\miniconda3\lib\site-packages (from torch) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in c:\users\hieub\miniconda3\lib\site-packages (from torch) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\hieub\miniconda3\lib\site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\hieub\appdata\roaming\python\python312\site-packages (from jinja2->torch) (3.0.2)
Note: you may need to restart the kernel to use updated packages.
In [4]:
pwd()
Out[4]:
'D:\\1. Lectures\\5-PRP201c_Python programming\\2_Ipynb'
In [5]:
pip install patool
Collecting patool
  Using cached patool-3.1.0-py2.py3-none-any.whl.metadata (4.3 kB)
Using cached patool-3.1.0-py2.py3-none-any.whl (98 kB)
Installing collected packages: patool
Successfully installed patool-3.1.0
Note: you may need to restart the kernel to use updated packages.
In [10]:
import patoolib

# Extract the Yolo archive into the Datasets folder
in_dir = 'Datasets/Yolo.zip'
out_dir = 'Datasets'
patoolib.extract_archive(in_dir, outdir=out_dir)
INFO patool: Extracting Datasets/Yolo.zip ...
INFO patool: could not find a 'file' executable, falling back to guess mime type by file extension
INFO patool: ... Datasets/Yolo.zip extracted to `Datasets'.
Out[10]:
'Datasets'
In [11]:
pwd()
Out[11]:
'D:\\1. Lectures\\5-PRP201c_Python programming\\2_Ipynb'
In [13]:
%run Datasets/Yolo/utils.py
In [14]:
%run Datasets/Yolo/darknet.py
In [16]:
# Set the location and name of the cfg file
cfg_file = 'Datasets/Yolo/cfg/yolov3.cfg'

# Set the location and name of the pre-trained weights file
weight_file = 'Datasets/Yolo/weights/yolov3.weights'

# Set the location and name of the COCO object classes file
namesfile = 'Datasets/Yolo/data/coco.names'

# Load the network architecture
m = Darknet(cfg_file)

# Load the pre-trained weights
m.load_weights(weight_file)

# Load the COCO object classes
class_names = load_class_names(namesfile)
Loading weights. Please Wait...100.00% Complete
In [17]:
print(class_names)
print(len(class_names))
['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
80
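The 80 names above come straight from the coco.names file, which stores one class name per line. As a minimal illustration of what a helper like load_class_names presumably does (the real implementation lives in the utils module; this sketch and the tiny stand-in file are purely illustrative):

```python
# Illustrative sketch of a class-names loader; the actual helper is in the
# utils module. Assumes a plain-text file with one class name per line.
def load_class_names_sketch(namesfile):
    with open(namesfile) as f:
        # Strip whitespace and skip blank lines
        return [line.strip() for line in f if line.strip()]

# Example with a tiny stand-in for coco.names
with open('tiny.names', 'w') as f:
    f.write('person\nbicycle\ncar\n')

print(load_class_names_sketch('tiny.names'))  # → ['person', 'bicycle', 'car']
```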

Taking a Look at the Neural Network¶

In [18]:
# Print the neural network used in YOLOv3
m.print_network()
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64
    2 conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32
    3 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
    4 shortcut 1
    5 conv    128  3 x 3 / 2   208 x 208 x  64   ->   104 x 104 x 128
    6 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64
    7 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
    8 shortcut 5
    9 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64
   10 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128
   11 shortcut 8
   12 conv    256  3 x 3 / 2   104 x 104 x 128   ->    52 x  52 x 256
   13 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   14 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   15 shortcut 12
   16 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   17 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   18 shortcut 15
   19 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   20 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   21 shortcut 18
   22 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   23 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   24 shortcut 21
   25 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   26 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   27 shortcut 24
   28 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   29 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   30 shortcut 27
   31 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   32 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   33 shortcut 30
   34 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
   35 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
   36 shortcut 33
   37 conv    512  3 x 3 / 2    52 x  52 x 256   ->    26 x  26 x 512
   38 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   39 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   40 shortcut 37
   41 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   42 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   43 shortcut 40
   44 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   45 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   46 shortcut 43
   47 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   48 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   49 shortcut 46
   50 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   51 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   52 shortcut 49
   53 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   54 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   55 shortcut 52
   56 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   57 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   58 shortcut 55
   59 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   60 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   61 shortcut 58
   62 conv   1024  3 x 3 / 2    26 x  26 x 512   ->    13 x  13 x1024
   63 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   64 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   65 shortcut 62
   66 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   67 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   68 shortcut 65
   69 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   70 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   71 shortcut 68
   72 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   73 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   74 shortcut 71
   75 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   76 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   77 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   78 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   79 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512
   80 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   81 conv    255  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 255
   82 detection
   83 route  79
   84 conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256
   85 upsample           * 2    13 x  13 x 256   ->    26 x  26 x 256
   86 route  85 61
   87 conv    256  1 x 1 / 1    26 x  26 x 768   ->    26 x  26 x 256
   88 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   89 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   90 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   91 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256
   92 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512
   93 conv    255  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 255
   94 detection
   95 route  91
   96 conv    128  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 128
   97 upsample           * 2    26 x  26 x 128   ->    52 x  52 x 128
   98 route  97 36
   99 conv    128  1 x 1 / 1    52 x  52 x 384   ->    52 x  52 x 128
  100 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
  101 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
  102 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
  103 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128
  104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256
  105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255
  106 detection

As we can see, the neural network used by YOLOv3 consists mainly of convolutional layers, with some shortcut connections and upsample layers. For a full description of this network, please refer to the YOLOv3 paper.

Loading and Resizing Our Images¶

In the code below, we load our images using OpenCV's cv2.imread() function. Since this function loads images in BGR order, we convert them to RGB so we can display them with the correct colors.

As we can see in the previous cell, the input size of the first layer of the network is 416 x 416 x 3. Since our images come in different sizes, we have to resize them to be compatible with the input size of the first layer. In the code below, we resize our images using OpenCV's cv2.resize() function and then plot the original and resized images.

In [19]:
# Set the default figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]
# Load the image
img = cv2.imread('Datasets/Yolo/images/city_scene.jpg')


# Convert the image to RGB
original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Resize the image to the input width and height of the first layer of the network
print(m.width, m.height)  # the Darknet model's expected input size
resized_image = cv2.resize(original_image, (m.width, m.height))

# Display the images
plt.subplot(121)
plt.title('Original Image')
plt.imshow(original_image)
plt.subplot(122)
plt.title('Resized Image')
plt.imshow(resized_image)
plt.show()
416 416
[Figure: the original image and the resized 416 x 416 image, side by side]

Setting the Non-Maximal Suppression Threshold¶

As you learned in the previous lessons, YOLO uses Non-Maximal Suppression (NMS) to keep only the best bounding boxes. The first step in NMS is to remove all the predicted bounding boxes that have a detection probability less than a given NMS threshold. In the code below, we set this NMS threshold to 0.3. This means that all predicted bounding boxes with a detection probability less than 0.3 will be removed.
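This first step can be sketched in a few lines. This is not the utils implementation, only an illustration; it assumes each box stores its detection confidence at index 4, matching the 7-parameter box layout used later in this notebook:

```python
# Illustrative first step of NMS: keep only boxes whose detection
# confidence (assumed to be stored at index 4) meets the threshold.
def filter_by_confidence(boxes, nms_thresh):
    return [box for box in boxes if box[4] >= nms_thresh]

# Two hypothetical boxes: (x, y, w, h, det_conf, cls_conf, cls_id)
candidates = [
    [0.5, 0.5, 0.2, 0.3, 0.95, 0.99, 0],  # confident detection, kept
    [0.1, 0.1, 0.1, 0.1, 0.10, 0.80, 2],  # weak detection, removed
]
kept = filter_by_confidence(candidates, 0.3)
print(len(kept))  # → 1
```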

In [20]:
# Set the NMS threshold
nms_thresh = 0.3  

Setting the Intersection Over Union Threshold¶

After removing all the predicted bounding boxes that have a low detection probability, the second step in NMS is to select the bounding boxes with the highest detection probability and eliminate all the bounding boxes whose Intersection Over Union (IOU) value with respect to them is higher than a given IOU threshold. In the code below, we set this IOU threshold to 0.4. This means that all predicted bounding boxes that have an IOU value greater than 0.4 with respect to the best bounding boxes will be removed.

In the utils module you will find the nms function, which performs the second step of Non-Maximal Suppression, and the boxes_iou function, which calculates the Intersection over Union of two given bounding boxes. You are encouraged to look at these functions to see how they work.
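As a rough illustration of what boxes_iou computes (this sketch is not the utils implementation; it assumes boxes are given as (center_x, center_y, width, height), the format YOLO predicts):

```python
# Illustrative IoU of two boxes given as (center_x, center_y, width, height).
def iou_sketch(b1, b2):
    # Convert center/size form to corner coordinates
    x1a, y1a = b1[0] - b1[2] / 2, b1[1] - b1[3] / 2
    x2a, y2a = b1[0] + b1[2] / 2, b1[1] + b1[3] / 2
    x1b, y1b = b2[0] - b2[2] / 2, b2[1] - b2[3] / 2
    x2b, y2b = b2[0] + b2[2] / 2, b2[1] + b2[3] / 2
    # Overlap along each axis (zero if the boxes don't intersect)
    ix = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    iy = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = ix * iy
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0

print(iou_sketch((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))  # → 1.0
print(iou_sketch((0.2, 0.2, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)))  # → 0.0
```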

In [21]:
# Set the IOU threshold
iou_thresh = 0.4
nms_thresh = 0.3  

Object Detection¶

Once the image has been loaded and resized, and we have chosen values for nms_thresh and iou_thresh, we can use the YOLO algorithm to detect objects in the image. We detect the objects using the detect_objects(m, resized_image, iou_thresh, nms_thresh) function from the utils module. This function takes in the model m returned by Darknet, the resized image, and the NMS and IOU thresholds, and returns the bounding boxes of the objects found.

Each bounding box contains 7 parameters: the coordinates (x, y) of the center of the bounding box, its width w and height h, the detection confidence, the object class probability, and the object class id. The detect_objects() function also prints the time it took for the YOLO algorithm to detect the objects in the image and the number of objects detected. Since we are running the algorithm on a CPU, it takes about 2 seconds to detect the objects in an image; on a GPU it would run much faster.

Once we have the bounding boxes of the objects found by YOLO, we can print the class of the objects found and their corresponding object class probability. To do this we use the print_objects() function in the utils module.

Finally, we use the plot_boxes() function to plot the bounding boxes and corresponding object class labels found by YOLO in our image. If you set the plot_labels flag to False, the bounding boxes are displayed with no labels. This makes it easier to view the bounding boxes if your nms_thresh is too low. The plot_boxes() function uses the same color for all bounding boxes of the same object class. However, if you want all bounding boxes to be the same color, you can use the color keyword to set the desired color. For example, if you want all the bounding boxes to be red you can use:

plot_boxes(original_image, boxes, class_names, plot_labels = True, color = (1,0,0))

You are encouraged to change the iou_thresh and nms_thresh parameters to see how they affect the YOLO detection algorithm. The default values of iou_thresh = 0.4 and nms_thresh = 0.6 work well to detect objects in different kinds of images. In the cell below, we have repeated some of the code used before so that you don't have to scroll up and down when you want to change the iou_thresh and nms_thresh parameters or the image. Have fun!

In [22]:
iou_thresh = 0.4
nms_thresh = 0.3  
# Detect objects in the image
boxes = detect_objects(m, resized_image, iou_thresh, nms_thresh)
# Print the objects found and the confidence level
print_objects(boxes, class_names)
#Plot the image with bounding boxes and corresponding object class labels
plot_boxes(original_image, boxes, class_names, plot_labels = True)

It took 2.348 seconds to detect the objects in the image.

Number of Objects Detected: 39 

Objects Found and Confidence Level:

1. person: 0.999996
2. person: 1.000000
3. car: 0.707238
4. truck: 0.933031
5. car: 0.658085
6. truck: 0.666981
7. person: 1.000000
8. traffic light: 1.000000
9. person: 1.000000
10. car: 0.997369
11. bus: 0.998023
12. person: 1.000000
13. person: 1.000000
14. person: 1.000000
15. person: 1.000000
16. person: 1.000000
17. traffic light: 1.000000
18. traffic light: 1.000000
19. handbag: 0.997282
20. traffic light: 1.000000
21. car: 0.989741
22. traffic light: 1.000000
23. traffic light: 0.999999
24. person: 0.999999
25. truck: 0.715037
26. traffic light: 1.000000
27. person: 0.999993
28. person: 0.999996
29. person: 0.999913
30. person: 0.999995
31. person: 0.999851
32. traffic light: 0.999520
33. person: 0.999997
34. traffic light: 1.000000
35. person: 0.756844
36. person: 0.967352
37. motorbike: 0.536371
38. traffic light: 0.999992
39. person: 0.999998
[Figure: bounding boxes and class labels plotted on the city scene image]
In [23]:
import numpy as np

img = original_image.copy()
width = img.shape[1]
height = img.shape[0]

DetectedList = []
for i in range(len(boxes)):
    box = boxes[i]
    # Convert the normalized (center_x, center_y, w, h) box to the (x, y)
    # pixel coordinates of its upper-left (x1, y1) and lower-right (x2, y2)
    # corners, relative to the size of the image.
    x1 = int(np.around((box[0] - box[2]/2.0) * width))
    y1 = int(np.around((box[1] - box[3]/2.0) * height))
    x2 = int(np.around((box[0] + box[2]/2.0) * width))
    y2 = int(np.around((box[1] + box[3]/2.0) * height))
    
    if len(box) >= 7 and class_names:
        cls_conf = box[5]
        cls_id = box[6]
        print('%i. %s: %f' % (i + 1, class_names[cls_id], float(cls_conf)))
        print(f"left : {x1} ; top : {y1} ; right : {x2} ; bottom : {y2}")
        
        d = {}
        d["objectname"] = class_names[cls_id]
        d["confident"] = float(cls_conf)
        d["pos"] = [x1, y1, x2, y2]
        DetectedList.append(d)
1. person: 0.999996
left : 4204 ; top : 2566 ; right : 4506 ; bottom : 3227
2. person: 1.000000
left : 328 ; top : 2618 ; right : 676 ; bottom : 3211
3. car: 0.707238
left : 2652 ; top : 2529 ; right : 2911 ; bottom : 2752
4. truck: 0.933031
left : 3650 ; top : 2515 ; right : 4086 ; bottom : 2891
5. car: 0.658085
left : 2283 ; top : 2533 ; right : 2577 ; bottom : 2828
6. truck: 0.666981
left : 1644 ; top : 2504 ; right : 2074 ; bottom : 2899
7. person: 1.000000
left : 3181 ; top : 2592 ; right : 3396 ; bottom : 3051
8. traffic light: 1.000000
left : 235 ; top : 1837 ; right : 374 ; bottom : 2206
9. person: 1.000000
left : 1148 ; top : 2559 ; right : 1276 ; bottom : 2853
10. car: 0.997369
left : 2036 ; top : 2582 ; right : 2311 ; bottom : 2844
11. bus: 0.998023
left : 1549 ; top : 2413 ; right : 2254 ; bottom : 2759
12. person: 1.000000
left : 886 ; top : 2561 ; right : 990 ; bottom : 2848
13. person: 1.000000
left : 1300 ; top : 2565 ; right : 1405 ; bottom : 2863
14. person: 1.000000
left : 1347 ; top : 2564 ; right : 1453 ; bottom : 2862
15. person: 1.000000
left : 768 ; top : 2557 ; right : 865 ; bottom : 2846
16. person: 1.000000
left : 4611 ; top : 2562 ; right : 4740 ; bottom : 2895
17. traffic light: 1.000000
left : 1593 ; top : 2072 ; right : 1666 ; bottom : 2251
18. traffic light: 1.000000
left : 3566 ; top : 2184 ; right : 3630 ; bottom : 2287
19. handbag: 0.997282
left : 564 ; top : 2781 ; right : 671 ; bottom : 2957
20. traffic light: 1.000000
left : 2485 ; top : 2321 ; right : 2519 ; bottom : 2378
21. car: 0.989741
left : 3469 ; top : 2599 ; right : 3643 ; bottom : 2837
22. traffic light: 1.000000
left : 2843 ; top : 2393 ; right : 2869 ; bottom : 2443
23. traffic light: 0.999999
left : 3822 ; top : 2195 ; right : 3882 ; bottom : 2283
24. person: 0.999999
left : 4477 ; top : 2560 ; right : 4588 ; bottom : 2872
25. truck: 0.715037
left : 2881 ; top : 2510 ; right : 3063 ; bottom : 2703
26. traffic light: 1.000000
left : 2468 ; top : 2226 ; right : 2508 ; bottom : 2321
27. person: 0.999993
left : 1061 ; top : 2561 ; right : 1148 ; bottom : 2840
28. person: 0.999996
left : 4134 ; top : 2580 ; right : 4356 ; bottom : 3135
29. person: 0.999913
left : 1000 ; top : 2566 ; right : 1088 ; bottom : 2837
30. person: 0.999995
left : 689 ; top : 2559 ; right : 790 ; bottom : 2882
31. person: 0.999851
left : 675 ; top : 2475 ; right : 4470 ; bottom : 2887
32. traffic light: 0.999520
left : 2792 ; top : 2438 ; right : 2820 ; bottom : 2471
33. person: 0.999997
left : 598 ; top : 2548 ; right : 716 ; bottom : 2896
34. traffic light: 1.000000
left : 4118 ; top : 1379 ; right : 4232 ; bottom : 1645
35. person: 0.756844
left : 3047 ; top : 2599 ; right : 3119 ; bottom : 2732
36. person: 0.967352
left : 294 ; top : 2630 ; right : 431 ; bottom : 3050
37. motorbike: 0.536371
left : 2571 ; top : 2582 ; right : 2631 ; bottom : 2735
38. traffic light: 0.999992
left : 1026 ; top : 2449 ; right : 1090 ; bottom : 2508
39. person: 0.999998
left : 4998 ; top : 2559 ; right : 5078 ; bottom : 2861
In [24]:
DetectedList
Out[24]:
[{'objectname': 'person',
  'confident': 0.9999955892562866,
  'pos': [4204, 2566, 4506, 3227]},
 {'objectname': 'person',
  'confident': 0.9999998807907104,
  'pos': [328, 2618, 676, 3211]},
 {'objectname': 'car',
  'confident': 0.7072378396987915,
  'pos': [2652, 2529, 2911, 2752]},
 {'objectname': 'truck',
  'confident': 0.9330310821533203,
  'pos': [3650, 2515, 4086, 2891]},
 {'objectname': 'car',
  'confident': 0.6580854058265686,
  'pos': [2283, 2533, 2577, 2828]},
 {'objectname': 'truck',
  'confident': 0.6669811010360718,
  'pos': [1644, 2504, 2074, 2899]},
 {'objectname': 'person', 'confident': 1.0, 'pos': [3181, 2592, 3396, 3051]},
 {'objectname': 'traffic light',
  'confident': 1.0,
  'pos': [235, 1837, 374, 2206]},
 {'objectname': 'person',
  'confident': 0.9999998807907104,
  'pos': [1148, 2559, 1276, 2853]},
 {'objectname': 'car',
  'confident': 0.9973688125610352,
  'pos': [2036, 2582, 2311, 2844]},
 {'objectname': 'bus',
  'confident': 0.9980230331420898,
  'pos': [1549, 2413, 2254, 2759]},
 {'objectname': 'person',
  'confident': 0.9999998807907104,
  'pos': [886, 2561, 990, 2848]},
 {'objectname': 'person', 'confident': 1.0, 'pos': [1300, 2565, 1405, 2863]},
 {'objectname': 'person',
  'confident': 0.9999995231628418,
  'pos': [1347, 2564, 1453, 2862]},
 {'objectname': 'person',
  'confident': 0.9999998807907104,
  'pos': [768, 2557, 865, 2846]},
 {'objectname': 'person',
  'confident': 0.9999997615814209,
  'pos': [4611, 2562, 4740, 2895]},
 {'objectname': 'traffic light',
  'confident': 1.0,
  'pos': [1593, 2072, 1666, 2251]},
 {'objectname': 'traffic light',
  'confident': 1.0,
  'pos': [3566, 2184, 3630, 2287]},
 {'objectname': 'handbag',
  'confident': 0.9972816705703735,
  'pos': [564, 2781, 671, 2957]},
 {'objectname': 'traffic light',
  'confident': 1.0,
  'pos': [2485, 2321, 2519, 2378]},
 {'objectname': 'car',
  'confident': 0.9897407293319702,
  'pos': [3469, 2599, 3643, 2837]},
 {'objectname': 'traffic light',
  'confident': 0.9999995231628418,
  'pos': [2843, 2393, 2869, 2443]},
 {'objectname': 'traffic light',
  'confident': 0.9999986886978149,
  'pos': [3822, 2195, 3882, 2283]},
 {'objectname': 'person',
  'confident': 0.9999992847442627,
  'pos': [4477, 2560, 4588, 2872]},
 {'objectname': 'truck',
  'confident': 0.7150365114212036,
  'pos': [2881, 2510, 3063, 2703]},
 {'objectname': 'traffic light',
  'confident': 1.0,
  'pos': [2468, 2226, 2508, 2321]},
 {'objectname': 'person',
  'confident': 0.999993085861206,
  'pos': [1061, 2561, 1148, 2840]},
 {'objectname': 'person',
  'confident': 0.9999961853027344,
  'pos': [4134, 2580, 4356, 3135]},
 {'objectname': 'person',
  'confident': 0.9999125003814697,
  'pos': [1000, 2566, 1088, 2837]},
 {'objectname': 'person',
  'confident': 0.9999953508377075,
  'pos': [689, 2559, 790, 2882]},
 {'objectname': 'person',
  'confident': 0.9998505115509033,
  'pos': [675, 2475, 4470, 2887]},
 {'objectname': 'traffic light',
  'confident': 0.999519944190979,
  'pos': [2792, 2438, 2820, 2471]},
 {'objectname': 'person',
  'confident': 0.9999966621398926,
  'pos': [598, 2548, 716, 2896]},
 {'objectname': 'traffic light',
  'confident': 0.9999996423721313,
  'pos': [4118, 1379, 4232, 1645]},
 {'objectname': 'person',
  'confident': 0.756843626499176,
  'pos': [3047, 2599, 3119, 2732]},
 {'objectname': 'person',
  'confident': 0.967352032661438,
  'pos': [294, 2630, 431, 3050]},
 {'objectname': 'motorbike',
  'confident': 0.5363708734512329,
  'pos': [2571, 2582, 2631, 2735]},
 {'objectname': 'traffic light',
  'confident': 0.999991774559021,
  'pos': [1026, 2449, 1090, 2508]},
 {'objectname': 'person',
  'confident': 0.9999980926513672,
  'pos': [4998, 2559, 5078, 2861]}]
In [25]:
import json
In [26]:
file_name = "ObjectDetection.json"
with open(file_name, "w") as fid: 
     json.dump(DetectedList, fid)
In [27]:
with open(file_name, "r") as read_file:
    data = json.load(read_file)
print(data)
[{'objectname': 'person', 'confident': 0.9999955892562866, 'pos': [4204, 2566, 4506, 3227]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [328, 2618, 676, 3211]}, {'objectname': 'car', 'confident': 0.7072378396987915, 'pos': [2652, 2529, 2911, 2752]}, {'objectname': 'truck', 'confident': 0.9330310821533203, 'pos': [3650, 2515, 4086, 2891]}, {'objectname': 'car', 'confident': 0.6580854058265686, 'pos': [2283, 2533, 2577, 2828]}, {'objectname': 'truck', 'confident': 0.6669811010360718, 'pos': [1644, 2504, 2074, 2899]}, {'objectname': 'person', 'confident': 1.0, 'pos': [3181, 2592, 3396, 3051]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [235, 1837, 374, 2206]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [1148, 2559, 1276, 2853]}, {'objectname': 'car', 'confident': 0.9973688125610352, 'pos': [2036, 2582, 2311, 2844]}, {'objectname': 'bus', 'confident': 0.9980230331420898, 'pos': [1549, 2413, 2254, 2759]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [886, 2561, 990, 2848]}, {'objectname': 'person', 'confident': 1.0, 'pos': [1300, 2565, 1405, 2863]}, {'objectname': 'person', 'confident': 0.9999995231628418, 'pos': [1347, 2564, 1453, 2862]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [768, 2557, 865, 2846]}, {'objectname': 'person', 'confident': 0.9999997615814209, 'pos': [4611, 2562, 4740, 2895]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [1593, 2072, 1666, 2251]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [3566, 2184, 3630, 2287]}, {'objectname': 'handbag', 'confident': 0.9972816705703735, 'pos': [564, 2781, 671, 2957]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [2485, 2321, 2519, 2378]}, {'objectname': 'car', 'confident': 0.9897407293319702, 'pos': [3469, 2599, 3643, 2837]}, {'objectname': 'traffic light', 'confident': 0.9999995231628418, 'pos': [2843, 2393, 2869, 2443]}, {'objectname': 'traffic light', 'confident': 
0.9999986886978149, 'pos': [3822, 2195, 3882, 2283]}, {'objectname': 'person', 'confident': 0.9999992847442627, 'pos': [4477, 2560, 4588, 2872]}, {'objectname': 'truck', 'confident': 0.7150365114212036, 'pos': [2881, 2510, 3063, 2703]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [2468, 2226, 2508, 2321]}, {'objectname': 'person', 'confident': 0.999993085861206, 'pos': [1061, 2561, 1148, 2840]}, {'objectname': 'person', 'confident': 0.9999961853027344, 'pos': [4134, 2580, 4356, 3135]}, {'objectname': 'person', 'confident': 0.9999125003814697, 'pos': [1000, 2566, 1088, 2837]}, {'objectname': 'person', 'confident': 0.9999953508377075, 'pos': [689, 2559, 790, 2882]}, {'objectname': 'person', 'confident': 0.9998505115509033, 'pos': [675, 2475, 4470, 2887]}, {'objectname': 'traffic light', 'confident': 0.999519944190979, 'pos': [2792, 2438, 2820, 2471]}, {'objectname': 'person', 'confident': 0.9999966621398926, 'pos': [598, 2548, 716, 2896]}, {'objectname': 'traffic light', 'confident': 0.9999996423721313, 'pos': [4118, 1379, 4232, 1645]}, {'objectname': 'person', 'confident': 0.756843626499176, 'pos': [3047, 2599, 3119, 2732]}, {'objectname': 'person', 'confident': 0.967352032661438, 'pos': [294, 2630, 431, 3050]}, {'objectname': 'motorbike', 'confident': 0.5363708734512329, 'pos': [2571, 2582, 2631, 2735]}, {'objectname': 'traffic light', 'confident': 0.999991774559021, 'pos': [1026, 2449, 1090, 2508]}, {'objectname': 'person', 'confident': 0.9999980926513672, 'pos': [4998, 2559, 5078, 2861]}]
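Each `pos` entry stores a bounding box as `[x1, y1, x2, y2]` corner coordinates in pixels, with `(x1, y1)` the top-left and `(x2, y2)` the bottom-right corner. As a quick sanity check, a small helper (hypothetical, not part of the notebook) can convert one of the boxes above into center/width/height form:

```python
# Hypothetical helper: convert one [x1, y1, x2, y2] box from the output
# above into (center_x, center_y, width, height) in pixels.
def box_to_cxcywh(pos):
    x1, y1, x2, y2 = pos
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

# First 'car' box from the printed detections.
print(box_to_cxcywh([2652, 2529, 2911, 2752]))  # (2781.5, 2640.5, 259, 223)
```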
In [28]:
display_image = original_image.copy()

# Create a figure and plot the image
fig, a = plt.subplots(1,1)
a.imshow(display_image)

for obj in data:
    if obj["objectname"] in ["car", "truck"]:
        print(obj)
        # Unpack the bounding box corners. (x1, y1) is the top-left corner and
        # (x2, y2) the bottom-right corner, in pixel coordinates.
        x1, y1, x2, y2 = obj["pos"]
        box_width = x2 - x1
        box_height = y2 - y1
        # Anchor the rectangle at the top-left corner; matplotlib places the
        # image origin at the top-left, so a positive height extends downward.
        rect = patches.Rectangle((x1, y1),
                                 box_width, box_height,
                                 linewidth = 2,
                                 edgecolor = 'r',
                                 facecolor = 'none')
        # Draw the bounding box on top of the image
        a.add_patch(rect)
        # Create a label string with the object class name and the
        # corresponding detection confidence
        conf_tx = obj["objectname"] + ': {:.2f}'.format(obj["confident"])

        # Define x and y offsets for the label, scaled to the image size
        lxc = (display_image.shape[1] * 0.266) / 100
        lyc = (display_image.shape[0] * 1.180) / 100

        # Draw the label on top of the image
        a.text(x1 + lxc, y1 - lyc, conf_tx, fontsize = 24, color = 'k',
               bbox = dict(facecolor = 'b', edgecolor = 'b', alpha = 0.8))
plt.show()
{'objectname': 'car', 'confident': 0.7072378396987915, 'pos': [2652, 2529, 2911, 2752]}
{'objectname': 'truck', 'confident': 0.9330310821533203, 'pos': [3650, 2515, 4086, 2891]}
{'objectname': 'car', 'confident': 0.6580854058265686, 'pos': [2283, 2533, 2577, 2828]}
{'objectname': 'truck', 'confident': 0.6669811010360718, 'pos': [1644, 2504, 2074, 2899]}
{'objectname': 'car', 'confident': 0.9973688125610352, 'pos': [2036, 2582, 2311, 2844]}
{'objectname': 'car', 'confident': 0.9897407293319702, 'pos': [3469, 2599, 3643, 2837]}
{'objectname': 'truck', 'confident': 0.7150365114212036, 'pos': [2881, 2510, 3063, 2703]}
[Output image: the street scene with red bounding boxes and blue confidence labels drawn over the detected cars and trucks]
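Before plotting, the same `data` list can also be filtered and summarized. Below is a minimal sketch using a hypothetical confidence threshold of 0.7 and a small hand-copied subset of the detections printed above; neither the threshold nor the variable names come from the notebook:

```python
from collections import Counter

# Hand-copied subset of the detections printed above (values rounded).
data = [
    {"objectname": "person", "confident": 0.99, "pos": [328, 2618, 676, 3211]},
    {"objectname": "car", "confident": 0.71, "pos": [2652, 2529, 2911, 2752]},
    {"objectname": "car", "confident": 0.66, "pos": [2283, 2533, 2577, 2828]},
    {"objectname": "truck", "confident": 0.93, "pos": [3650, 2515, 4086, 2891]},
]

# Keep only confident vehicle detections, then count them per class.
vehicles = [d for d in data
            if d["objectname"] in ("car", "truck") and d["confident"] >= 0.7]
counts = Counter(d["objectname"] for d in vehicles)
print(counts)  # Counter({'car': 1, 'truck': 1})
```

The 0.66-confidence car falls below the threshold, so only one car and one truck survive the filter.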

Exercise 2 (optional): Replace the image used for object detection with a different image of your choice.¶

In [ ]: